A cod fish, one of the top species captured by marine fisheries (8)

How did global fishing activity change during the Covid-19 pandemic?

2020 is a year we will never forget. Covid-19 spread rapidly across the globe and forced much of humanity into quarantine. The pandemic had clearly devastating impacts on economies of all scales. Travel was heavily restricted, and even when crossing country borders was possible, it was closely monitored. However, the pandemic also boosted some sectors of the economy and increased demand for certain goods. How did Covid-19 impact the fishing economy? Did fisheries respond to the pandemic by sending fishermen and fisherwomen home to quarantine, or did some countries see it as an opportunity to fish the high seas more than ever before? I could not find any literature that answers this question, likely because 2020 was less than a year ago and formal studies on this topic may not have been published yet.

Regulating fishing and other vessel activity across the globe is a challenge in itself (4). Databases often have large gaps for various reasons, such as a lack of reliable data from automatic identification systems and vessel registration that is voluntary for vessel owners. Global Fishing Watch is an organization that aims to revolutionize the way we monitor fishing activity across the world by combining satellite remote sensing with automatic identification systems. Global Fishing Watch collects and visualizes global fishing data with the goal of promoting ocean sustainability, transparency, and open-source science. It tracks vessels from countries around the world, including their movements, boat types, and time stamps for fishing and docking at ports. Without such efforts to monitor, publicize, and regulate ocean activity, our marine resources are at high risk of depletion. On a global scale, we are fishing faster than fish stocks can naturally replenish. This has severe economic impacts: according to a World Bank report, the resulting depletion of marine fish stocks causes economic losses of 50 billion US dollars annually (4). With modern data science and applied statistics, we can better understand fishing activity on a global scale and help protect our planet’s marine biodiversity.

As an aspiring wildlife biologist and data scientist, I’m interested in applying statistical analysis to Global Fishing Watch data to learn how different countries’ fishing effort changed in 2020, relative to those countries’ fishing trends in the years leading up to 2020. In this dataset, fishing effort is defined as the number of hours spent fishing (3). I chose this dataset for three reasons. First, it is already relatively clean. Second, I trust its reliability, because Global Fishing Watch is a highly respected data resource that combines highly accurate remotely sensed data with automatic identification system data collected on shore. Third, I am interested in working with Global Fishing Watch and spatial data in the future. This analysis has no spatial component, because we treat countries as a categorical variable, and the temporal variable is limited to years. The only bias I believe might be present in this data is that it is limited to boats that either voluntarily allow their fishing hours to be tracked (for example, through automatic identification systems) or have been detected remotely by satellite.

With Global Fishing Watch’s expansive open-source data collection, we can approach this question by grouping all vessels’ fishing hours by country, identifying each country’s trend in the years up to 2019, and extrapolating that trend into 2020. By comparing this 2020 prediction to the actual 2020 fishing data, we can see how Covid-19 affected large-scale fishing effort. I chose this approach because I am familiar with these methods (apart from the for loop) from my graduate statistics course. I also believe it is the simplest and most accurate way to obtain a p-value indicating whether there is a statistically significant difference between countries’ actual and predicted mean fishing effort in 2020. Perhaps the global fishing economy skyrocketed, plummeted into near nonexistence, or remained unscathed by the pandemic. Quantitative analysis will help provide some insight.

Global Fishing Watch offers an interactive map that displays fishing activity across the globe through a heat map. This visualization has the potential to inspire data scientists, fish enthusiasts, environmental justice advocates, pandemic researchers, and everyone in between to examine fishing activity during a time period of interest.

Global fishing activity from January 1, 2020 through January 1, 2021

Global Fishing Watch and their partners also provide an interactive map that allows users to interact with vessels across the globe, filter by country, and overlay port locations on coastlines.

# the tidyverse includes my go-to set of functions for data cleaning and wrangling
library(tidyverse)
# lubridate helps us manage time stamps and annual trends
library(lubridate)
# gt helps make beautiful tables to summarize our data
library(gt)
# broom's augment() returns a model's fitted values as a data frame, which we use for plotting
library(broom)
data = read_csv(file.path('data', 'fishing-vessels-v2.csv'))
## Rows: 114191 Columns: 28
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (7): flag_ais, flag_registry, flag_gfw, vessel_class_inferred, vessel_c...
## dbl (20): mmsi, vessel_class_inferred_score, length_m_inferred, length_m_reg...
## lgl  (1): self_reported_fishing_vessel

Data Cleaning and Wrangling

Global Fishing Watch’s data includes fishing effort and vessel information from 124 countries over the years 2012-2020. First, we select our variables of interest, group the vessels by flag country, and compute each country’s mean fishing hours per vessel for each year (3).
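The next chunk writes out one `mean()` call per year. As a sketch of a more compact alternative, dplyr's `across()` can apply the same summary to every `fishing_hours_*` column at once. The tiny `toy` table below is made up: its country codes and hours are hypothetical and only mimic the column naming of the Global Fishing Watch file.

```r
library(dplyr)

# hypothetical toy data: three vessels from two flag states, two years of fishing hours
toy <- tibble(
  flag_gfw = c("AAA", "AAA", "BBB"),
  fishing_hours_2019 = c(100, 300, NA),
  fishing_hours_2020 = c(50, 150, 80)
)

# mean hours per vessel for every fishing_hours_* column, by flag state
toy_means <- toy %>%
  group_by(flag_gfw) %>%
  summarize(across(starts_with("fishing_hours_"), ~ mean(.x, na.rm = TRUE))) %>%
  # strip the prefix so the columns are named by year, e.g. "2019"
  rename_with(~ sub("fishing_hours_", "", .x), starts_with("fishing_hours_"))

toy_means
```

A country whose vessels are all NA in a given year gets `NaN` rather than `NA`. `is.na()` returns `TRUE` for both, so NA counts and `!is.na()` filters still catch those countries.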

# clean the data: keep the flag country and the yearly fishing-hours columns, then take each country's mean hours per vessel for each year

effort_trends <- data %>% 
  select(flag_gfw, 
         fishing_hours_2012,
         fishing_hours_2013,
         fishing_hours_2014,
         fishing_hours_2015,
         fishing_hours_2016,
         fishing_hours_2017,
         fishing_hours_2018,
         fishing_hours_2019,
         fishing_hours_2020) %>% 
  group_by(flag_gfw) %>% 
  summarize("2012" = mean(fishing_hours_2012, na.rm = TRUE),
            "2013" = mean(fishing_hours_2013, na.rm = TRUE),
            "2014" = mean(fishing_hours_2014, na.rm = TRUE),
            "2015" = mean(fishing_hours_2015, na.rm = TRUE),
            "2016" = mean(fishing_hours_2016, na.rm = TRUE),
            "2017" = mean(fishing_hours_2017, na.rm = TRUE),
            "2018" = mean(fishing_hours_2018, na.rm = TRUE),
            "2019" = mean(fishing_hours_2019, na.rm = TRUE),
            "2020" = mean(fishing_hours_2020, na.rm = TRUE))

Our goal is to run a linear regression on each country’s fishing effort over multiple years, but many countries have NA values for certain years. With data available for 2012-2020, which years should we choose? We want a block of consecutive years leading up to 2020 with minimal data gaps. We will drop every row with an NA value, and each row of `effort_trends` represents a country. Minimizing NA values therefore keeps as many countries as possible. To choose the start year for the regression period, we count the NA values in each year from 2012 to 2017; a start year after 2017 would leave fewer than three years before 2020. It turns out that 2017 has the fewest NA values, so we use 2017-2019 as our 3-year regression period. Next, we convert the data into tidy format and remove NA values so that we can fit a linear regression over time.
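Counting NA values year by year can also be done in a single call. The sketch below assumes a wide table shaped like `effort_trends`, with one row per country and one column per year. The data frame `effort_toy` and its values are hypothetical. `colSums(is.na(...))` counts the missing country means in every year column, and `which.min()` picks the candidate start year with the fewest gaps (on a tie, the earliest year).

```r
# hypothetical wide table: one row per country, one column per candidate start year
effort_toy <- data.frame(
  flag_gfw = c("AAA", "BBB", "CCC", "DDD"),
  `2015` = c(NA, NA, 3, 4),
  `2016` = c(NA, 2, 3, 4),
  `2017` = c(1, 2, 3, 4),
  check.names = FALSE  # keep the year column names exactly as written
)

# number of countries with a missing mean in each year column
na_per_year <- colSums(is.na(effort_toy[, -1]))
na_per_year

# candidate start year with the fewest missing values
best_start <- names(which.min(na_per_year))
best_start
```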

# only need to look at 2012 - 2017 rather than 2012 - 2019 because we want a few years of data to plug into the linear regression
sum(is.na(effort_trends$"2012"))
## [1] 83
sum(is.na(effort_trends$"2013"))
## [1] 71
sum(is.na(effort_trends$"2014"))
## [1] 60
sum(is.na(effort_trends$"2015"))
## [1] 52
sum(is.na(effort_trends$"2016"))
## [1] 42
sum(is.na(effort_trends$"2017"))
## [1] 37
# change it to tidy format using pivot_longer()
# remove all NA values, and take out the year 2020 because we want to compare what we would EXPECT in 2020 based on what we saw in 2017-2019

effort_trends_tidy_no_na = effort_trends %>%
  select(flag_gfw, "2017":"2019") %>% 
  pivot_longer(cols = ("2017":"2019"),
               names_to = "year",
               values_to = "mean_effort") %>% 
  filter(!is.na(mean_effort),
         !is.na(flag_gfw))

Our years are currently stored as character strings, because `pivot_longer()` turned the year column names into values. We need them in date format to fit a linear regression over time. We will convert these years and remove all countries that only have data for 1 or 2 years. Each regression needs multiple years of data, and we want every country to have the same number of years, starting in 2017.

# append "-01-01" so each year converts to January 1 of that year and the points line up with the yearly ticks when we plot later
# parsing a bare year with as.Date(year, format = "%Y") typically fills in TODAY'S month and day instead, which would skew the x axis

month_day <- "-01-01"
effort_trends_tidy_no_na_date <- effort_trends_tidy_no_na %>% 
  mutate(year = paste0(year, month_day))

# keep only countries with data in all three years (2017-2019), then convert the year strings to Date objects
countries_clean <- effort_trends_tidy_no_na_date %>% 
  group_by(flag_gfw) %>%
  filter(n()>2) %>% 
  mutate(year = as.Date(year, format = "%Y-%m-%d"))
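Two ways to turn year strings into January 1 dates are sketched below on a hypothetical `years` vector. The first is the paste-then-parse approach used in this post. The second is lubridate's `make_date()`, which defaults to month 1 and day 1 when only a year is supplied. Both should yield the same `Date` values.

```r
library(lubridate)

# hypothetical year strings, as produced by pivot_longer()
years <- c("2017", "2018", "2019")

# approach used in this post: paste a month and day, then parse
jan_first <- as.Date(paste0(years, "-01-01"), format = "%Y-%m-%d")

# lubridate alternative: month and day default to 1
jan_first_lubridate <- make_date(year = as.integer(years))

jan_first
jan_first_lubridate
```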

Linear Regression

Now that the data is sufficiently clean and our years are of class Date, we can fit a linear regression of fishing effort over time for every country from 2017-2019. The output coefficients show which direction each country is trending, that is, whether the country is fishing more or less over time. We do this with `sapply()`, fitting one `lm()` per country and storing each model’s intercept and slope in a list named by country code. Because `year` is a Date, `lm()` measures time in days since 1970-01-01, so each slope is in fishing hours per day. Then we feed this list into a for loop, plugging each country’s intercept and slope into a linear equation to predict its 2020 fishing effort from its historical trend. Next, we combine the predicted and actual 2020 fishing effort into a single data frame to compare countries. We add a column with the difference between the actual and predicted values, and a column stating whether each country increased or decreased its fishing effort in 2020 relative to its trend leading up to 2020.
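Because `year` is a Date, `lm()` treats it as the number of days since 1970-01-01. The slope is therefore in fishing hours per day, and the intercept is the fitted value on 1970-01-01. Evaluating a fitted trend in 2020 thus means plugging in the day count of 2020-01-01 (18262) rather than a year offset. The sketch below fits a single hypothetical country (`toy_country`, with made-up values that rise by 10 hours per year). It then evaluates the line on 2020-01-01 both with `predict()` and by hand from the coefficients.

```r
# hypothetical mean fishing hours for one country, 2017-2019
toy_country <- data.frame(
  year = as.Date(c("2017-01-01", "2018-01-01", "2019-01-01")),
  mean_effort = c(100, 110, 120)
)

fit <- lm(mean_effort ~ year, data = toy_country)

# slope is in hours per DAY, because a Date is stored as days since 1970-01-01
coef(fit)

# evaluate the fitted line on 2020-01-01; predict() converts the date the same way as the data
pred_2020 <- predict(fit, newdata = data.frame(year = as.Date("2020-01-01")))

# the same value by hand: intercept + slope * (day count of 2020-01-01)
manual_2020 <- coef(fit)[[1]] + coef(fit)[[2]] * as.numeric(as.Date("2020-01-01"))

pred_2020
manual_2020
```

With these made-up values the trend is exactly linear, so both approaches give 130 hours for 2020.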

# fit one linear model of mean effort over time per country and store its
# coefficients (intercept, slope) in a list named by country code
models <- sapply(unique(as.character(countries_clean$flag_gfw)),
                 function(country) as.numeric(coef(lm(mean_effort ~ year, countries_clean, subset = (flag_gfw == country)))),
                 simplify = FALSE, USE.NAMES = TRUE)

# collect the predicted 2020 fishing effort, one row per country
prediction_data <- NULL
for (i in 1:length(models)) {
  predicted_effort_2020 <- models[[i]][1] + models[[i]][2]*3
  prediction_data <- rbind(prediction_data, predicted_effort_2020)
  print(paste0("In 2020, the predicted fishing hours is ", predicted_effort_2020))
}
## [1] "In 2020, the predicted fishing hours is -714.630617960425"
## [1] "In 2020, the predicted fishing hours is 21323.8139148635"
## [1] "In 2020, the predicted fishing hours is -15964.5013093548"
## [1] "In 2020, the predicted fishing hours is 3063.43204190972"
## [1] "In 2020, the predicted fishing hours is 3920.82121431707"
## [1] "In 2020, the predicted fishing hours is 128.23614639422"
## [1] "In 2020, the predicted fishing hours is -5260.57026370865"
## [1] "In 2020, the predicted fishing hours is 1517.044911641"
## [1] "In 2020, the predicted fishing hours is -4621.67579709532"
## [1] "In 2020, the predicted fishing hours is -8513.62224455465"
## [1] "In 2020, the predicted fishing hours is -981.574775032544"
## [1] "In 2020, the predicted fishing hours is -264.498452521068"
## [1] "In 2020, the predicted fishing hours is 29.7043865127327"
## [1] "In 2020, the predicted fishing hours is 2547.66461187216"
## [1] "In 2020, the predicted fishing hours is -9202.21302054795"
## [1] "In 2020, the predicted fishing hours is -48305.5353603756"
## [1] "In 2020, the predicted fishing hours is 8751.59463203964"
## [1] "In 2020, the predicted fishing hours is -6001.90685464233"
## [1] "In 2020, the predicted fishing hours is 1283.96323378996"
## [1] "In 2020, the predicted fishing hours is 2076.14289036648"
## [1] "In 2020, the predicted fishing hours is 2095.36140210988"
## [1] "In 2020, the predicted fishing hours is 2196.79044520795"
## [1] "In 2020, the predicted fishing hours is 1121.5678700261"
## [1] "In 2020, the predicted fishing hours is -659.442750983932"
## [1] "In 2020, the predicted fishing hours is -614.712096534242"
## [1] "In 2020, the predicted fishing hours is -622.188796803659"
## [1] "In 2020, the predicted fishing hours is 2440.04259718047"
## [1] "In 2020, the predicted fishing hours is -212.889099692058"
## [1] "In 2020, the predicted fishing hours is 11353.6736253739"
## [1] "In 2020, the predicted fishing hours is 5379.04083328041"
## [1] "In 2020, the predicted fishing hours is 19892.2039631442"
## [1] "In 2020, the predicted fishing hours is 12432.3616601726"
## [1] "In 2020, the predicted fishing hours is 360.147009683571"
## [1] "In 2020, the predicted fishing hours is -9040.37144824968"
## [1] "In 2020, the predicted fishing hours is -7878.83460609789"
## [1] "In 2020, the predicted fishing hours is -1804.83305205488"
## [1] "In 2020, the predicted fishing hours is 4633.1340082446"
## [1] "In 2020, the predicted fishing hours is -6392.7980578387"
## [1] "In 2020, the predicted fishing hours is 2084.29252418284"
## [1] "In 2020, the predicted fishing hours is 793.523593267236"
## [1] "In 2020, the predicted fishing hours is 10011.5469946728"
## [1] "In 2020, the predicted fishing hours is 1630.64687100459"
## [1] "In 2020, the predicted fishing hours is 342.940351598192"
## [1] "In 2020, the predicted fishing hours is 2463.04742303282"
## [1] "In 2020, the predicted fishing hours is -3226.26850813617"
## [1] "In 2020, the predicted fishing hours is 2454.32945813705"
## [1] "In 2020, the predicted fishing hours is 3036.96936884982"
## [1] "In 2020, the predicted fishing hours is 11382.2820722007"
## [1] "In 2020, the predicted fishing hours is -696.202352791085"
## [1] "In 2020, the predicted fishing hours is -1617.38884173934"
## [1] "In 2020, the predicted fishing hours is 1911.57505901916"
## [1] "In 2020, the predicted fishing hours is 8392.45706642112"
## [1] "In 2020, the predicted fishing hours is -36595.9786133945"
## [1] "In 2020, the predicted fishing hours is 23212.5189251469"
## [1] "In 2020, the predicted fishing hours is -1532.3511993912"
## [1] "In 2020, the predicted fishing hours is 2919.39457993244"
## [1] "In 2020, the predicted fishing hours is 6870.78853652997"
## [1] "In 2020, the predicted fishing hours is -4531.47638305432"
## [1] "In 2020, the predicted fishing hours is -21321.4544392909"
## [1] "In 2020, the predicted fishing hours is 2544.00944098046"
## [1] "In 2020, the predicted fishing hours is 4842.51970070318"
## [1] "In 2020, the predicted fishing hours is -6839.74369586345"
## [1] "In 2020, the predicted fishing hours is 1552.75177853496"
## [1] "In 2020, the predicted fishing hours is 6532.87900258756"
## [1] "In 2020, the predicted fishing hours is 4260.15240128292"
## [1] "In 2020, the predicted fishing hours is -11084.2763337188"
## [1] "In 2020, the predicted fishing hours is 33638.2508125572"
## [1] "In 2020, the predicted fishing hours is 5336.37900472927"
## [1] "In 2020, the predicted fishing hours is 5970.28642638358"
## [1] "In 2020, the predicted fishing hours is 889.64467841351"
## [1] "In 2020, the predicted fishing hours is -3075.15478425483"
## [1] "In 2020, the predicted fishing hours is 19089.6613661812"
## [1] "In 2020, the predicted fishing hours is -30911.9529894978"
## [1] "In 2020, the predicted fishing hours is 13660.3968592086"
## [1] "In 2020, the predicted fishing hours is 6932.44249590471"
## [1] "In 2020, the predicted fishing hours is 1082.44871161888"
## [1] "In 2020, the predicted fishing hours is 24871.8567711822"
## [1] "In 2020, the predicted fishing hours is 5128.10923634413"
## [1] "In 2020, the predicted fishing hours is -12579.6721244088"
## [1] "In 2020, the predicted fishing hours is -1021.0202325802"
## [1] "In 2020, the predicted fishing hours is 6055.03879174114"
## [1] "In 2020, the predicted fishing hours is 7547.32986495483"
## [1] "In 2020, the predicted fishing hours is 3379.54718696408"
## [1] "In 2020, the predicted fishing hours is 6397.77266508881"
## [1] "In 2020, the predicted fishing hours is 16569.4786103501"
## [1] "In 2020, the predicted fishing hours is -2010.17766169026"
## [1] "In 2020, the predicted fishing hours is -3999.65042909331"
## [1] "In 2020, the predicted fishing hours is 8357.22542328771"
## [1] "In 2020, the predicted fishing hours is 9122.82605552599"
## [1] "In 2020, the predicted fishing hours is 1434.38606512835"
## [1] "In 2020, the predicted fishing hours is 2226.05861345844"
## [1] "In 2020, the predicted fishing hours is -5543.81420395741"
## [1] "In 2020, the predicted fishing hours is -15976.6181415525"
## [1] "In 2020, the predicted fishing hours is -1632.90774292241"
## [1] "In 2020, the predicted fishing hours is 4788.65453576867"
## [1] "In 2020, the predicted fishing hours is -5481.20683713852"
## [1] "In 2020, the predicted fishing hours is 7470.46279661342"
## [1] "In 2020, the predicted fishing hours is -796.296884629973"
## [1] "In 2020, the predicted fishing hours is 11036.4979619368"
## [1] "In 2020, the predicted fishing hours is 3571.41109589051"
## [1] "In 2020, the predicted fishing hours is -6074.04241411181"
## [1] "In 2020, the predicted fishing hours is -822.531682937372"
## [1] "In 2020, the predicted fishing hours is 179.001142417618"
## [1] "In 2020, the predicted fishing hours is 21416.9099969559"
## [1] "In 2020, the predicted fishing hours is -2082.47322266226"
## [1] "In 2020, the predicted fishing hours is 11122.5783709424"
## [1] "In 2020, the predicted fishing hours is 76.4612603436158"
## [1] "In 2020, the predicted fishing hours is 3842.55706445384"
## [1] "In 2020, the predicted fishing hours is 984.294894057463"
## [1] "In 2020, the predicted fishing hours is -43059.1179482498"
## [1] "In 2020, the predicted fishing hours is 3644.29901302408"
## [1] "In 2020, the predicted fishing hours is -5775.80776451778"
## [1] "In 2020, the predicted fishing hours is 12866.5349689352"
## [1] "In 2020, the predicted fishing hours is 693.458215405196"
# figure out which countries were used in the for loop so we can get the actual 2020 effort data for those countries only
countries_clean_unique <- countries_clean %>% 
  group_by(flag_gfw) %>%
  slice_head(n = 1)

# set these countries as a vector so we can subset the effort_trends data to only include those countries
countries_to_compare <- unique(countries_clean_unique$flag_gfw)
countries_to_compare
##   [1] "AFG" "AGO" "ALB" "ARG" "AUS" "BEL" "BGR" "BHR" "BLZ" "BRA" "CAN" "CHL"
##  [13] "CHN" "CIV" "CMR" "COK" "COL" "CPV" "CUW" "CYP" "DEU" "DNK" "DZA" "ECU"
##  [25] "ESP" "EST" "FIN" "FJI" "FLK" "FRA" "FRO" "FSM" "GBR" "GEO" "GHA" "GIN"
##  [37] "GNB" "GNQ" "GRC" "GRL" "GTM" "HKG" "HND" "HRV" "IDN" "IND" "IRL" "IRN"
##  [49] "ISL" "ISR" "ITA" "JPN" "KEN" "KIR" "KNA" "KOR" "LBR" "LBY" "LKA" "LTU"
##  [61] "LVA" "MAR" "MEX" "MHL" "MLT" "MNE" "MOZ" "MRT" "MUS" "MYS" "NAM" "NCL"
##  [73] "NGA" "NIC" "NLD" "NOR" "NRU" "NZL" "PAN" "PER" "PHL" "PNG" "POL" "PRT"
##  [85] "PYF" "QAT" "REU" "ROU" "RUS" "SAU" "SEN" "SGP" "SHN" "SLB" "SLV" "SPM"
##  [97] "SVN" "SWE" "SYC" "TCA" "THA" "TUN" "TUR" "TUV" "TWN" "UKR" "UNK" "URY"
## [109] "USA" "VCT" "VEN" "VNM" "VUT" "ZAF"
# ensure that there are the same number of rows (countries) in both datasets
nrow(countries_clean_unique)
## [1] 114
nrow(prediction_data)
## [1] 114
# keep only those countries in the effort trends data, then attach the predictions
comparison_2020 <- effort_trends %>% 
  select(flag_gfw, "2020") %>%
  rename(actual_2020 = "2020") %>% 
  filter(flag_gfw %in% countries_to_compare) %>% 
  cbind(prediction_data) %>% 
  rename(prediction_2020 = prediction_data) %>% 
  filter(!is.na(actual_2020))
# countries without actual 2020 data (NaN means) are removed only AFTER cbind(), so that each actual 2020 value
# is first bound to the prediction in the same row; removing them earlier would misalign the two columns

# drop countries with negative predicted fishing hours: negative hours are impossible, so the linear fit does not describe these countries' data well
comparison_2020_pos <- comparison_2020 %>% 
  filter(prediction_2020 >= 0)

# take the difference between the actual and the predicted columns
comparison_2020_pos <- comparison_2020_pos %>% 
  mutate(difference = actual_2020 - prediction_2020) %>% 
  mutate(change_direc = case_when(
    difference < 0 ~ "fished LESS than trend",
    difference > 0 ~ "fished MORE than trend"))
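The comment in the chunk above points out that `cbind()` only lines up actual and predicted values correctly if both tables list the countries in the same order. A sketch of an order-independent alternative matches rows by country code with dplyr's `inner_join()`. It assumes the predictions are stored in a data frame with a `flag_gfw` column; the post builds them as a matrix instead. The `actual` and `predicted` tables below are hypothetical.

```r
library(dplyr)

# hypothetical inputs; note the actual values are NOT in the same row order as the predictions
actual <- tibble(flag_gfw = c("BBB", "AAA", "CCC"), actual_2020 = c(80, 120, NaN))
predicted <- tibble(flag_gfw = c("AAA", "BBB", "CCC"), prediction_2020 = c(100, 90, 50))

comparison <- predicted %>%
  # match rows by country code instead of by position
  inner_join(actual, by = "flag_gfw") %>%
  # drop countries without 2020 data (is.na() is TRUE for NaN as well)
  filter(!is.na(actual_2020)) %>%
  mutate(difference = actual_2020 - prediction_2020,
         change_direc = case_when(
           difference < 0 ~ "fished LESS than trend",
           difference > 0 ~ "fished MORE than trend"))

comparison
```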

Plotting Actual Fishing Effort versus Predicted Fishing Effort for Malaysia

What does a single country’s fishing trend look like? Let’s consider Malaysia, in Southeast Asia. In 2015, Malaysia’s fisheries sector employed 175,980 people and contributed 1.1% of national gross domestic product. The fish trade is valued at $1.7 billion (US dollars), and average fish consumption is estimated at 56.8 kg per person per year. Malaysian fisheries primarily capture shrimp, squid, and fish. Malaysia contributes to the global fish economy through both importing and exporting fish (6).

We can make a country-specific fishing effort plot by filtering our actual fishing effort data to that country and appending the predicted 2020 fishing effort that our linear model produced for it.

# filter fishing effort and yearly means from the original data 

trends_mys <- effort_trends %>%
  select(flag_gfw, "2017":"2020") %>% 
  pivot_longer(cols = ("2017":"2020"),
               names_to = "year",
               values_to = "mean_effort") %>% 
  filter(!is.na(mean_effort),
         !is.na(flag_gfw))

month_day <- "-01-01"
trends_mys <- trends_mys %>% 
  mutate(year = paste0(year, month_day))

# keep Malaysia only, provided it has more than two years of data, and convert the years to dates
mys_countries_clean <- trends_mys %>% 
  group_by(flag_gfw) %>%
  filter(n()>2,
         flag_gfw == "MYS") %>% 
  mutate(year = as.Date(year, format = "%Y-%m-%d")) %>% 
  rename(actual_mean_effort = mean_effort)

# add the predicted series for MYS: the actual 2017-2019 means followed by the model's 2020 prediction
mys_countries_clean$prediction_2020 <- c(912.1514, 735.6150, 910.6168, 889.64468)

# now make a linear model on the actual and the predicted data

mys_model_actual = lm(actual_mean_effort ~ year, data = mys_countries_clean)
summary(mys_model_actual)
## 
## Call:
## lm(formula = actual_mean_effort ~ year, data = mys_countries_clean)
## 
## Residuals:
##       1       2       3       4 
##   72.71 -124.24   30.36   21.17 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -120.33217 2281.69913  -0.053    0.963
## year           0.05591    0.12877   0.434    0.707
## 
## Residual standard error: 105.1 on 2 degrees of freedom
## Multiple R-squared:  0.08613,    Adjusted R-squared:  -0.3708 
## F-statistic: 0.1885 on 1 and 2 DF,  p-value: 0.7065
mys_model_predicted = lm(prediction_2020 ~ year, data = mys_countries_clean)
summary(mys_model_predicted)
## 
## Call:
## lm(formula = prediction_2020 ~ year, data = mys_countries_clean)
## 
## Residuals:
##       1       2       3       4 
##   66.27 -121.02   43.24   11.52 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.404e+02  2.227e+03   0.153    0.893
## year        2.945e-02  1.257e-01   0.234    0.837
## 
## Residual standard error: 102.6 on 2 degrees of freedom
## Multiple R-squared:  0.02672,    Adjusted R-squared:  -0.4599 
## F-statistic: 0.05491 on 1 and 2 DF,  p-value: 0.8365
# pad the y-axis limits slightly beyond the data; the offsets (+8 and -16) were chosen by hand so that, after rounding, both limits land on multiples of 10, which are easier for the reader to comprehend quickly
max_y_mys = round(max(mys_countries_clean$actual_mean_effort+8), 0)
max_y_mys
## [1] 930
min_y_mys = round(min(mys_countries_clean$actual_mean_effort-16), 0)
min_y_mys
## [1] 720
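# the color strings inside aes() act only as group labels: scale_color_discrete() assigns
# its default palette to the two groups and relabels them as actual vs. predicted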

mys_plot <- ggplot() +
   geom_point(data = mys_countries_clean,
              aes(x = year, y = actual_mean_effort, color = "brown1"),
              size = 9,
              shape = 18) +
   geom_line(data = augment(mys_model_actual),
             aes(x = year, y = .fitted, color = "brown1"),
             size = 2) + 
   geom_point(data = mys_countries_clean,
              aes(x = year, y = prediction_2020, color = "cyan3"),
              size = 9,
              shape = 18) +
   geom_line(data = augment(mys_model_predicted),
             aes(x = year, y = .fitted, color = "cyan3"),
             size = 2) +
   scale_x_date(date_labels = "%Y",
                date_breaks = "1 year") +
   ggtitle("Malaysia's Fishing Effort: Actual vs. Predicted 2017-2020") +
   xlab("Year") + 
   ylab("Mean Fishing Hours") +
   theme(panel.background = element_blank(),
         axis.title.x = element_text(color = "black", size = 17),
         axis.text.x = element_text(face = "bold", color = "black", size = 15),
         axis.title.y = element_text(color = "black", size = 17),
         axis.text.y = element_text(face = "bold", color = "black", size = 12),
         plot.title = element_text(color="black", size = 17, face = "bold"),
         panel.border = element_rect(colour = "black", fill = NA, size = 2),
         legend.position = "right") +
   scale_y_continuous(breaks = seq(min_y_mys, max_y_mys, by = 20)) +
   scale_color_discrete(name = "Data Type", labels = c("Actual Fishing Effort", "Predicted Fishing Effort"))

mys_plot

Malaysia’s Fishing Effort: Actual vs Predicted 2017-2020
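The y-axis limits in the plot above rely on hand-picked offsets. A more general sketch rounds the lower limit down and the upper limit up to the nearest multiple of the tick spacing with `floor()` and `ceiling()`, so the breaks always bracket the data. The `effort` values here are illustrative, not taken from the dataset.

```r
# illustrative mean fishing hours to show on the y axis
effort <- c(731.4, 850.2, 918.7)

# tick spacing for the axis
step <- 20

# round the lower limit down and the upper limit up to multiples of the spacing
min_y <- floor(min(effort) / step) * step
max_y <- ceiling(max(effort) / step) * step

# tick positions that bracket every data value
breaks_y <- seq(min_y, max_y, by = step)
breaks_y
```

These breaks could then be passed to `scale_y_continuous(breaks = breaks_y)` in place of the hand-tuned limits.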

Malaysia fished slightly more in 2020 than its pre-2020 trend predicted. Its mean fishing effort per vessel was approximately 922 hours, while our model predicted approximately 890 hours, a small difference of about 32 hours.
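To put the size of Malaysia’s difference in context, the sketch below expresses it as a percentage of the prediction. It uses two numbers reported in this post: the 2020 prediction of 889.64468 hours and the actual-minus-predicted difference of 32.19 hours shown for MYS in the summary table below.

```r
# Malaysia's values as reported in this post
predicted_mys <- 889.64468   # trend-based 2020 prediction (hours)
difference_mys <- 32.19      # actual 2020 minus predicted 2020 (hours)

# difference as a percentage of the prediction
pct_diff_mys <- 100 * difference_mys / predicted_mys
round(pct_diff_mys, 1)
```

The actual 2020 value is therefore about 3.6% above the trend-based prediction.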

Statistical Significance

It’s time to run a t-test to determine whether there is a statistically significant difference between the countries’ predicted fishing effort in 2020 and their actual fishing effort in 2020. A t-test is a handy statistical tool that compares the difference between two group means to the variability within the groups. The p-value is the probability of observing a difference in means at least as extreme as ours if there were truly no difference. If the p-value is greater than 0.05 (a conventional threshold in statistics and environmental data science), a difference this large could plausibly have arisen by chance. If the p-value is less than 0.05, a difference at least this extreme would be unlikely under the null hypothesis. The result is then considered statistically significant, and we reject our null hypothesis.
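To see what the p-value and the 0.05 threshold look like in practice, the sketch below runs `t.test()` on two small hypothetical groups. With two vectors and no other options, `t.test()` performs a Welch two-sample t-test (`var.equal = FALSE`). That is the same test the formula call `t.test(mean_effort ~ actual_or_predicted, ...)` later in this post runs on its two groups.

```r
# two hypothetical groups of values with clearly different means
group_a <- c(10, 11, 12, 13, 14)
group_b <- c(20, 21, 22, 23, 24)

# Welch two-sample t-test, the default in t.test() (var.equal = FALSE)
result <- t.test(group_a, group_b, conf.level = 0.95)

# probability of a difference at least this extreme if there were no true difference
result$p.value

# reject the null hypothesis at the 5% level?
result$p.value < 0.05
```

Here the group means are 12 and 22 and each group has variance 2.5. The t statistic is therefore (12 - 22) / sqrt(2.5/5 + 2.5/5) = -10, and the p-value falls far below 0.05.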

Null Hypothesis: There is no difference between the predicted country-specific fishing effort in 2020 and the actual country-specific fishing effort in 2020.
\[H_{0}: \mu_{\text{predicted}} - \mu_{\text{actual}} = 0\]

Alternative Hypothesis: There is a difference between the predicted country-specific fishing effort in 2020 and the actual country-specific fishing effort in 2020. Because of the pandemic in 2020, I expect that fishing effort decreased, meaning that the actual country-specific fishing effort is less than the predicted country-specific fishing effort. The test itself is two-sided, so it would also detect an increase.
\[H_{A}: \mu_{\text{predicted}} - \mu_{\text{actual}} \neq 0\]
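Each country contributes both an actual and a predicted value, so the two groups are matched by country. One common way to use that pairing is a paired t-test, which tests whether the mean of the per-country differences is zero. The sketch below shows this alternative to the two-sample test used in this post on hypothetical vectors. In a paired test, both vectors must list the countries in the same order.

```r
# hypothetical actual and predicted 2020 values for the same four countries, in the same order
actual_2020 <- c(5, 7, 9, 4)
prediction_2020 <- c(6, 6, 10, 6)

# paired t-test on the per-country differences (actual - predicted)
paired <- t.test(actual_2020, prediction_2020, paired = TRUE)

# mean of the differences: (-1 + 1 - 1 - 2) / 4 = -0.75
paired$estimate
paired$p.value
```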

Don’t forget to convert the data to Tidy format so we can run the t-test!

comparison_tidy <- comparison_2020_pos %>% 
  pivot_longer(cols = ("actual_2020":"prediction_2020"),
               names_to = "actual_or_predicted",
               values_to = "mean_effort")

# include this setting so the tiny p-value is not in scientific notation
options(scipen = 999)
ttest = t.test(mean_effort ~ actual_or_predicted, data = comparison_tidy, conf.level = 0.95)

t-test output

The p-value is 0.0000000312, which is less than 0.05, so we can reject our null hypothesis that there is no difference between the predicted and the actual country-specific fishing effort in 2020. On average across countries, actual 2020 fishing effort differed significantly from the effort predicted by each country’s historical trend.

Summary: Which countries increased their fishing effort during the pandemic, relative to their trend leading up to 2020?

To best visualize this fishing effort data in a table, we color the countries that increased their fishing effort relative to their trend in red and the countries that decreased it in green.

# convert the comparison data to a table
# keep one row per country and only the columns of interest for the table
comparison_rearranged_simplified <- comparison_2020_pos %>% 
  select(flag_gfw, difference, change_direc)

good_bad_table <- comparison_rearranged_simplified %>% 
  gt() %>%
  tab_header(
    title = md("**Which countries increased or decreased 2020 fishing effort relative to their trend?**")
  ) %>%
  fmt_passthrough(
    columns = c(flag_gfw)
  ) %>%
  fmt_number(
  columns = c(difference)
  ) %>%
  fmt_passthrough(
    columns = c(change_direc)
  ) %>%
  cols_label(flag_gfw = "Country Code", 
             difference = "Difference: Actual - Prediction",
             change_direc = "Fishing Effort Relative to Trend") %>% 
  tab_style(
    style = list(
      cell_fill(color = "chartreuse2"),
      cell_text(weight = "bold")
      ),
    locations = cells_body(
      columns = c(flag_gfw, difference, change_direc),
      rows = change_direc == "fished LESS than trend")
  ) %>% 
  tab_style(
    style = list(
      cell_fill(color = "brown2"),
      cell_text(weight = "bold")
      ),
    locations = cells_body(
      columns = c(flag_gfw, difference, change_direc),
      rows = change_direc == "fished MORE than trend")
  ) %>% 
  tab_source_note(source_note = "Data Source: Global Fishing Watch: https://globalfishingwatch.org/datasets-and-code/") %>%
  opt_align_table_header(align = "center") %>% 
  cols_width(
    flag_gfw ~ px(150),
    difference ~ px(150),
    change_direc ~ px(220)
  ) %>% 
  cols_align(align = "center")

good_bad_table
Which countries increased or decreased 2020 fishing effort relative to their trend?
Country Code Difference: Actual - Prediction Fishing Effort Relative to Trend
AGO −18,839.60 fished LESS than trend
ARG −1,589.04 fished LESS than trend
AUS −2,912.65 fished LESS than trend
BEL 1,957.92 fished MORE than trend
BHR −1,096.14 fished LESS than trend
CHN 483.96 fished MORE than trend
CIV 1,716.01 fished MORE than trend
COL −7,640.26 fished LESS than trend
CUW −679.30 fished LESS than trend
CYP −893.07 fished LESS than trend
DEU −1,132.22 fished LESS than trend
DNK −1,138.36 fished LESS than trend
DZA −966.74 fished LESS than trend
FIN −1,181.76 fished LESS than trend
FLK −8,901.31 fished LESS than trend
FRA −3,911.13 fished LESS than trend
FRO −17,991.09 fished LESS than trend
FSM −10,412.30 fished LESS than trend
GBR 696.23 fished MORE than trend
GNB −2,829.85 fished LESS than trend
GRC −1,090.50 fished LESS than trend
GRL 1,740.20 fished MORE than trend
GTM −9,408.47 fished LESS than trend
HKG −1,135.99 fished LESS than trend
HND 499.66 fished MORE than trend
HRV −1,312.42 fished LESS than trend
IND −2,115.41 fished LESS than trend
IRL −1,781.87 fished LESS than trend
IRN −10,708.15 fished LESS than trend
ITA −455.14 fished LESS than trend
JPN −7,306.61 fished LESS than trend
KIR −22,066.68 fished LESS than trend
KOR −1,961.90 fished LESS than trend
LBR −3,835.59 fished LESS than trend
LTU −965.70 fished LESS than trend
LVA −3,866.25 fished LESS than trend
MEX −183.88 fished LESS than trend
MHL −4,459.09 fished LESS than trend
MLT −3,403.21 fished LESS than trend
MOZ −30,463.68 fished LESS than trend
MRT −3,967.54 fished LESS than trend
MUS −4,556.00 fished LESS than trend
MYS 32.19 fished MORE than trend
NCL −16,767.26 fished LESS than trend
NIC −12,025.34 fished LESS than trend
NLD −5,488.85 fished LESS than trend
NOR −637.39 fished LESS than trend
NRU −23,059.46 fished LESS than trend
NZL −3,033.13 fished LESS than trend
PHL −4,969.43 fished LESS than trend
PNG −6,850.56 fished LESS than trend
POL −2,654.60 fished LESS than trend
PRT −5,154.99 fished LESS than trend
PYF −14,175.81 fished LESS than trend
ROU −7,758.79 fished LESS than trend
RUS −6,501.55 fished LESS than trend
SAU −1,057.84 fished LESS than trend
SEN 1,264.19 fished MORE than trend
SLV −4,308.69 fished LESS than trend
SVN −6,581.25 fished LESS than trend
SYC −7,965.57 fished LESS than trend
TCA −2,742.51 fished LESS than trend
TUR 235.48 fished MORE than trend
TUV −20,921.43 fished LESS than trend
UKR −9,779.68 fished LESS than trend
UNK 327.21 fished MORE than trend
URY −1,521.12 fished LESS than trend
USA −148.03 fished LESS than trend
VEN −2,640.55 fished LESS than trend
VUT −9,212.82 fished LESS than trend
ZAF 872.42 fished MORE than trend
Data Source: Global Fishing Watch: https://globalfishingwatch.org/datasets-and-code/

Differences between actual 2020 fishing effort and predicted 2020 fishing effort by country

This color-coded table shows that 60 of the 71 countries included in this analysis (about 85%) fished less during the Covid-19 pandemic in 2020 than their trend leading up to 2020 predicted, while the remaining 11 countries (about 15%) fished more. The fishing sectors of the vast majority of countries seem to have followed the stay-at-home orders enforced across the globe. While this may have had a detrimental impact on the global fishing economy, hopefully marine fish populations were able to recover and thrive during this reprieve from fishing pressure. The results of my statistical analysis are consistent with the conclusion of a 2021 scientific study investigating the change in marine recreational fishing activity during the first year of the pandemic (5).
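The percentages above come from counting how many rows fall into each category. Below is a minimal sketch of one way to do that in R. The `classify_change()` helper is hypothetical (the classification code is not shown in this section), and it assumes `difference` is actual minus predicted 2020 effort, which is the reading under which negative values mean "fished LESS than trend". The five rows are copied from the table above purely for illustration.

```r
# Hypothetical helper: label a country by the sign of its difference.
# Assumes difference = actual - predicted 2020 effort, so a negative value
# means the country fished less than its trend. A difference of exactly 0
# is counted as MORE here, which is an arbitrary choice.
classify_change <- function(difference) {
  ifelse(difference < 0, "fished LESS than trend", "fished MORE than trend")
}

# Five rows copied from the table above, for illustration only
example <- data.frame(
  flag_gfw   = c("AGO", "BEL", "CHN", "USA", "ZAF"),
  difference = c(-18839.60, 1957.92, 483.96, -148.03, 872.42)
)
example$change_direc <- classify_change(example$difference)

# Percentage of countries in each category, rounded to whole numbers
shares <- round(100 * prop.table(table(example$change_direc)))
print(shares)
```

Applied to all 71 rows of the table, the same count gives 60 countries below trend and 11 above, that is 60 / 71 ≈ 84.5% and 11 / 71 ≈ 15.5%, which round to the 85% and 15% quoted above.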

Future Steps

To make this analysis more robust in the future, I recommend using more than 3 years of fishing effort data to produce a more accurate linear model. I would also recommend a different statistical approach than fitting a separate linear regression to each country's fishing effort data in a for loop, because these per-country regressions were not very accurate. Lastly, I recommend running this analysis on fishing effort data from other sources in addition to Global Fishing Watch's data. This would help confirm that the data are accurate and that the results are reproducible.
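As one illustration of an alternative to looping over countries, the per-country trends can be fitted in a single `lm()` call with a country-by-year interaction. This is a sketch under stated assumptions: the column names `year` and `fishing_effort`, the country codes `AAA` and `BBB`, the 2017 to 2019 training years, and all effort values are made up for illustration and are not Global Fishing Watch data. With a full interaction, each country's fitted line is the same as a separate per-country regression, so this mainly tidies the workflow (one model object, one `predict()` call); on its own it does not make the trends more accurate, which is why more years of data still matter.

```r
# Toy data (made up): three pre-2020 years for two hypothetical countries
train <- data.frame(
  flag_gfw       = rep(c("AAA", "BBB"), each = 3),
  year           = rep(2017:2019, times = 2),
  fishing_effort = c(100, 110, 120,   # AAA rises by 10 per year
                     200, 190, 180)   # BBB falls by 10 per year
)

# One model with a separate intercept and slope for each country
trend_fit <- lm(fishing_effort ~ flag_gfw * year, data = train)

# Predict 2020 effort for each country from its own trend line
new_2020 <- data.frame(flag_gfw = c("AAA", "BBB"), year = 2020)
new_2020$predicted <- predict(trend_fit, newdata = new_2020)

# Compare with (made-up) observed 2020 effort: actual - predicted
new_2020$actual     <- c(115, 175)
new_2020$difference <- new_2020$actual - new_2020$predicted
print(new_2020)
```

Here the predicted 2020 efforts are 130 and 170, so the hypothetical country `AAA` fished 15 units below its trend and `BBB` 5 units above it. A natural next step, if more years were available, would be a model that shares information across countries rather than fitting each trend independently.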

Thank you for reading my statistical review of global fishing effort during the 2020 Covid-19 pandemic. I hope you have been inspired to run your own linear regressions and t-tests and to create visualizations that help communicate trends in environmental data science. Please feel free to contact me at with any questions, comments, or suggestions. You may also create issues or pull requests for this analysis through GitHub (repository linked below).

Data Availability

The data used in this analysis are openly available, but users must create a free account on the Global Fishing Watch website, which can be accessed through this link:
Global Fishing Watch Datasets and Code

Acknowledgements:

  • I would like to acknowledge Dr. Tamma Carleton, my professor in Statistics for Environmental Data Science at the UC Santa Barbara Bren School of Environmental Science & Management, for all her support throughout this project and this quarter.
  • I would also like to thank my peers in the Master of Environmental Data Science Program for being so open to collaboration and supporting each other with resources, programming tools, and open-source science.
  • Lastly, I would like to thank Global Fishing Watch for inspiring me to give a hoot about global fishing effort by country, and for providing the data that made this project possible.